Morphological Generation of German for SMT
Abstract
We participated in the ACL WMT 2009 shared task for translation from German to English and from English to German. We used the open-source Moses system combined with morphological processing. For German to English, ours was the only constrained system comparable with the open-data systems. One reason the system performed well was a strong reduction of the German vocabulary through a simple corpus-driven algorithm with minimal linguistic knowledge, which performed aggressive inflection removal and compound splitting.
For the English to German task, we submitted a two-step system. In the first step, we translated English into the reduced German representation (the same representation we used as input to our German to English system). In the second step, we built another Moses system to “translate” the reduced German representation back into normal German by adding inflection and merging split compound words. This two-step system was the worst constrained system submitted; in fact, it was the only constrained system that stood apart from all the others, which otherwise scored as a group.
Our presentation at the Research Workshop of the Israel Science Foundation on Machine Translation and Morphologically-rich Languages will describe work done as a direct reaction to the poor performance of our English to German system, and of SMT systems in general, on this translation task. Our current English to German system for news translation improves by 0.84 BLEU over the baseline. It uses sophisticated morphological generation based on SMOR, the University of Stuttgart morphological analyzer/generator of German, and BitPar, a state-of-the-art parser of German also developed in-house.
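The compound-splitting half of the vocabulary reduction, and its reversal in the second step, can be illustrated with a minimal sketch. It does not reproduce the authors' algorithm: it swaps in the well-known frequency-based splitting of Koehn and Knight (2003), which picks the segmentation whose parts have the highest geometric mean of corpus frequencies, and it leaves out inflection removal and linking elements such as -s-. The marker `@@`, the functions `split_compound`, `reduce_sentence`, and `merge_compounds`, and the toy frequency table are all hypothetical. The merge step is also a simplification: the system described here used a second Moses system to restore normal German, whereas this sketch merges deterministically from the markers.

```python
from math import prod

JOIN = "@@"  # hypothetical marker appended to non-final compound parts


def split_compound(word, freq, min_len=3):
    """Koehn & Knight-style splitting: return the segmentation of `word`
    whose parts have the highest geometric mean of corpus frequencies.
    The unsplit word is itself a candidate and wins ties. Linking
    elements such as -s- are not handled in this sketch."""
    best, best_score = [word], freq.get(word.lower(), 0)

    def search(start, parts):
        nonlocal best, best_score
        if start == len(word):
            if len(parts) > 1:  # the one-part case is already the baseline
                counts = [freq[p.lower()] for p in parts]
                score = prod(counts) ** (1 / len(counts))
                if score > best_score:
                    best, best_score = list(parts), score
            return
        # Try every known piece of at least min_len characters starting here.
        for end in range(start + min_len, len(word) + 1):
            piece = word[start:end]
            if freq.get(piece.lower(), 0) > 0:
                parts.append(piece)
                search(end, parts)
                parts.pop()

    search(0, [])
    return best


def reduce_sentence(tokens, freq):
    """Split every token; mark all parts except the last with JOIN."""
    out = []
    for tok in tokens:
        parts = split_compound(tok, freq)
        out.extend(p + JOIN for p in parts[:-1])
        out.append(parts[-1])
    return out


def merge_compounds(tokens):
    """Reverse step: glue each marked part onto the token that follows it."""
    out, pending = [], ""
    for tok in tokens:
        if tok.endswith(JOIN):
            pending += tok[: -len(JOIN)]
        else:
            out.append(pending + tok)
            pending = ""
    if pending:  # dangling marked part at sentence end
        out.append(pending)
    return out


if __name__ == "__main__":
    # Toy lowercase frequency table (invented for illustration only).
    freq = {"haus": 120, "tür": 80, "haustür": 2,
            "die": 1000, "ist": 900, "offen": 50}
    reduced = reduce_sentence(["Die", "Haustür", "ist", "offen"], freq)
    print(reduced)  # ['Die', 'Haus@@', 'tür', 'ist', 'offen']
    print(merge_compounds(reduced))  # ['Die', 'Haustür', 'ist', 'offen']
```

In this sketch, marking the non-final parts keeps the reduced representation reversible by construction; without such a marker, compound merging has to be predicted from context rather than read off the tokens.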
Similar resources
CimS - The CIS and IMS Joint Submission to WMT 2015 addressing morphological and syntactic differences in English to German SMT
We present the CimS submissions to the WMT 2015 Shared Task for the translation direction English to German. Similar to our previous submissions, all of our systems are aware of the complex nominal morphology of German. In this paper, we combine source-side reordering and target-side compound processing with basic morphological processing in order to obtain improved translation results. We also...
Applying Morphology Generation Models to Machine Translation
We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages. Our inflection generation models are trained independently of the SMT system. We investigate different ways of combining the inflection prediction component with the SMT syst...
A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation
When translating between two languages that differ in their degree of morphological synthesis, syntactic structures in one language may be realized as morphological structures in the other, and SMT models need a mechanism to learn such translations. Prior work has used morpheme splitting with flat representations that do not encode the hierarchical structure between morphemes, but this structur...
Chinese Syntactic Reordering for Adequate Generation of Korean Verbal Phrases in Chinese-to-Korean SMT
Chinese and Korean belong to different language families in terms of word-order and morphological typology. Chinese is an SVO and morphologically poor language while Korean is an SOV and morphologically rich one. In Chinese-to-Korean SMT systems, systematic differences between the verbal systems of the two languages make the generation of Korean verbal phrases difficult. To resolve the difficul...
Morpho-Syntactic Analysis for Reordering in Statistical Machine Translation
In the framework of statistical machine translation (SMT), correspondences between the words in the source and the target language are learned from bilingual corpora on the basis of so-called alignment models. Among other things these are meant to capture the differences in word order in different languages. In this paper we show that SMT can take advantage of the explicit introduction of some ...
Publication date: 2010